Multi-resolution auditory scene analysis: robust speech recognition using pattern-matching from a noisy signal
Authors
Abstract
Unlike automatic speech recognition systems, humans can understand speech when other competing sounds are present. Although the theory of auditory scene analysis (ASA) may help to explain this ability, some perceptual experiments show fusion of the speech signal under circumstances in which ASA principles might be expected to cause segregation. We propose a model of multi-resolution ASA that uses both high- and low-resolution representations of the auditory signal in parallel in order to resolve this conflict. The use of parallel representations reduces variability for pattern-matching while retaining the ability to identify and segregate low-level features of the signal. An important feature of the model is the assumption that features of the auditory signal are fused together unless there is good reason to segregate them. Speech is recognised by matching the low-resolution representation to previously learned speech templates without prior segregation of the signal into separate perceptual streams; this contrasts with the approach generally used by computational models of ASA. We describe an implementation of the multi-resolution model, using hidden Markov models, that illustrates the feasibility of this approach and achieves much higher identification performance than standard techniques used for computer recognition of speech mixed with other sounds.
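The template-matching idea can be illustrated with a toy example: a quantised low-resolution feature sequence is scored against several discrete-output hidden Markov models (one per learned speech template) with the Viterbi algorithm, and the best-scoring template wins. This is a minimal sketch with made-up parameters, not the paper's actual models.

```python
import math

def viterbi_log(obs, pi, A, B):
    """Log-probability of the best state path through a discrete-output HMM.

    pi: initial state probabilities; A: transition matrix;
    B: emission probabilities, B[state][symbol].
    """
    n_states = len(pi)
    # Initialise with the first observation.
    delta = [math.log(pi[s]) + math.log(B[s][obs[0]]) for s in range(n_states)]
    for t in range(1, len(obs)):
        delta = [
            max(delta[r] + math.log(A[r][s]) for r in range(n_states))
            + math.log(B[s][obs[t]])
            for s in range(n_states)
        ]
    return max(delta)

# Two hypothetical two-state left-to-right "word templates" over a
# binary symbol alphabet (tiny probabilities stand in for zeros).
template_a = dict(pi=[1.0, 1e-12], A=[[0.6, 0.4], [1e-12, 1.0]],
                  B=[[0.9, 0.1], [0.1, 0.9]])
template_b = dict(pi=[1.0, 1e-12], A=[[0.6, 0.4], [1e-12, 1.0]],
                  B=[[0.1, 0.9], [0.9, 0.1]])

obs = [0, 0, 1, 1]  # quantised low-resolution feature sequence
scores = {name: viterbi_log(obs, **tpl)
          for name, tpl in [("a", template_a), ("b", template_b)]}
best = max(scores, key=scores.get)
```

Note that no segregation happens before matching: the whole observation sequence is scored against each template directly, in the spirit of the model described above.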
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
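The spectral mean normalization mentioned in the abstract can be sketched as plain per-coefficient mean subtraction across frames, which removes a stationary channel or noise bias from log-spectral features. This is a generic sketch of the technique, not the authors' exact procedure.

```python
def mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames.

    frames: list of feature vectors, one per analysis frame.
    After normalization each coefficient has zero mean across frames,
    removing any constant (channel/noise) offset.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

# Two 2-dimensional feature frames (illustrative values).
frames = [[1.0, 4.0], [3.0, 6.0]]
norm = mean_normalize(frames)
```

The same operation applied to cepstral rather than spectral features is the standard cepstral mean normalization used throughout ASR front-ends.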
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components in the time-frequency representation of the signal (spectrogram) that present a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then reconstructed from the remaining components and statistical ...
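The tagging step described above can be sketched as a reliability mask over spectrogram cells: a cell is kept when its local SNR against an estimated per-band noise floor exceeds a threshold, and flagged missing otherwise. The numbers and threshold here are illustrative assumptions.

```python
import math

def missing_mask(spectrogram, noise_floor, snr_threshold_db=0.0):
    """Tag each time-frequency cell reliable (True) or missing (False).

    spectrogram: list of frames, each a list of per-band powers.
    noise_floor: estimated noise power per band.
    A cell is reliable when its local SNR exceeds the threshold.
    """
    mask = []
    for frame in spectrogram:
        row = []
        for power, noise in zip(frame, noise_floor):
            snr_db = 10.0 * math.log10(power / noise)
            row.append(snr_db >= snr_threshold_db)
        mask.append(row)
    return mask

spec = [[10.0, 0.5], [0.2, 8.0]]   # two frames, two bands (made-up powers)
noise = [1.0, 1.0]                 # estimated per-band noise power
mask = missing_mask(spec, noise)
```

The cells flagged False would then be deleted and reconstructed from the reliable cells, as the abstract describes.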
A case for multi-resolution auditory scene analysis
A commonly held view of auditory scene analysis is that complex auditory environments are segregated into separate perceptual streams using primitive cues that can be attended to separately. We argue that this view is inconsistent with the majority of perceptual data reported in the literature and propose an alternative model that is based on a primary, low-resolution signal representation used...
Missing data techniques for robust speech recognition
In noisy listening conditions, the information available on which to base speech recognition decisions is necessarily incomplete: some spectro-temporal regions are dominated by other sources. We report on the application of a variety of techniques for missing data in speech recognition. These techniques may be based on marginal distributions or on reconstruction of missing parts of the spectrum...
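The marginal-distribution technique mentioned above can be sketched for a diagonal-covariance Gaussian model: marginalising out the missing dimensions amounts to evaluating the density over the reliable dimensions only, since each missing dimension integrates to one. The parameters below are toy values, not from the paper.

```python
import math

def marginal_loglik(x, mask, mean, var):
    """Log-likelihood of x under a diagonal Gaussian, marginalising
    over the dimensions flagged missing (mask value False)."""
    ll = 0.0
    for xi, ok, mu, v in zip(x, mask, mean, var):
        if ok:  # reliable dimension: score it normally
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
        # missing dimension: its marginal integrates to 1, so it
        # contributes nothing to the log-likelihood
    return ll

x = [0.0, 99.0]        # second component corrupted by noise
mask = [True, False]   # ...and marked missing
full = marginal_loglik(x, [True, True], [0.0, 0.0], [1.0, 1.0])
marg = marginal_loglik(x, mask, [0.0, 0.0], [1.0, 1.0])
```

Ignoring the corrupted component prevents a single noise-dominated cell from vetoing an otherwise good match, which is the core benefit of the marginal approach.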
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in speech recognition systems for feature extraction as well as acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN which has a bottleneck layer among its fully connected layers. The bottleneck fea...
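The bottleneck idea can be sketched with two plain fully connected layers: a wide feature vector is squeezed through a low-dimensional layer, and the activations of that narrow layer serve as compact features. The layer shapes and weights here are toy assumptions, with biases and nonlinearities omitted for brevity.

```python
def linear(x, w):
    """Fully connected layer: y[j] = sum_i x[i] * w[i][j]
    (no bias, no nonlinearity, for brevity)."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

# Toy dimensions: an 8-d input squeezed through a 3-d bottleneck.
x = [0.1] * 8
w_in = [[0.5] * 3 for _ in range(8)]    # 8 -> 3 (bottleneck layer)
w_out = [[0.5] * 8 for _ in range(3)]   # 3 -> 8 (following layer)
bottleneck = linear(x, w_in)            # compact bottleneck features
restored = linear(bottleneck, w_out)
```

In a real CBN the bottleneck sits among the fully connected layers after the convolutional stages, and its activations are extracted as low-dimensional features for the recogniser.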